Rutgers' HARD Track Experiences at TREC 2004

Authors

  • Nicholas J. Belkin
  • I. Chaleva
  • Michael J. Cole
  • Y.-L. Li
  • Lu Liu
  • Ying-Hsang Liu
  • Gheorghe Muresan
  • Catherine L. Smith
  • Ying Sun
  • Xiaojun Yuan
  • Xiao-Min Zhang
Abstract

1 Introduction

The goal of our work in the HARD track was to test techniques for using knowledge about various aspects of the information seeker's context to improve IR system performance. We were particularly concerned with knowledge that could be gained through implicit sources of evidence, rather than through explicit questioning of the information seeker. We therefore did not submit any clarification form, preferring to rely on the categories of supplied metadata concerning the user, which we believed could, at least in principle, be inferred from user behavior, either in the past or during the current information-seeking episode.

The experimental condition of the HARD track was for each site to submit at least one baseline run for the set of 50 topics, using only the title and (optionally) description fields for query construction. The results of the baseline run(s) were compared with the results from one or more experimental runs, which made use of the supplied searcher metadata and of a clarification form submitted to the searcher, asking for whatever information each site thought would be useful in improving search results. We used only the supplied metadata, for the reasons stated above, and especially because we were interested in how to make initial queries better, rather than in how to conduct a dialogue with a searcher.

There were five categories of searcher metadata for each topic (not all topics had values for all five): Genre, Familiarity, Geography, Granularity, and Related text(s). These were intended to represent aspects of the searcher's context that might be useful in tailoring retrieval to the individual and to the individual situation. We assumed that at least some of these categories would be available to the IR system prior to (or in conjunction with) the specific search session, through either explicit or implicit evidence. For us, therefore, the HARD track experimental condition was designed to test whether knowledge of these contextual characteristics, and our specific ways of using that knowledge, would result in better retrieval performance than a good IR system without such knowledge.

We understood that there would be, in general, two ways to take account of the metadata. One would be to modify the initial query from the (presumed) searcher before submitting it for search; the other would be to search with the initial query, and then to modify (i.e. re-rank) the results before showing them to the …
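The distinction between the two strategies (modifying the query before retrieval versus re-ranking after retrieval) can be illustrated with a minimal sketch. The function names, metadata keys, document features, and boost weight below are illustrative assumptions for exposition, not the actual Rutgers TREC 2004 implementation.

# A minimal sketch, assuming metadata arrives as a dict keyed by the five
# HARD categories; all names and weights here are hypothetical.

def modify_query(title, description, metadata):
    """Strategy 1: expand the initial title/description query with
    metadata-derived terms before submitting it to the search engine."""
    terms = title.split() + description.split()
    if metadata.get("Geography"):                  # e.g. a regional focus
        terms.append(metadata["Geography"])
    for related in metadata.get("Related text(s)", []):
        terms.extend(related.split()[:10])         # a few terms from related texts
    return " ".join(terms)

def rerank(results, metadata, doc_features, boost=0.2):
    """Strategy 2: search with the unmodified query, then re-rank the
    result list by boosting documents whose features match the metadata."""
    rescored = []
    for doc_id, score in results:                  # results: list of (doc_id, score)
        feats = doc_features.get(doc_id, {})
        if metadata.get("Genre") and feats.get("genre") == metadata["Genre"]:
            score += boost
        if metadata.get("Geography") and feats.get("region") == metadata["Geography"]:
            score += boost
        rescored.append((doc_id, score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

In either case, the baseline run corresponds to retrieving with the unmodified title/description query and leaving the ranking untouched.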


Similar Articles

Ranking using Metadata

This was the first time that RGU had participated in the HARD track, and indeed in TREC. We were interested in investigating the effect of exploiting the topic metadata to re-rank our initial baseline run, in a similar fashion to that of Rutgers in TREC 2003 [Belkin et al., 2003]. We used the Lemur toolkit (LTK) to obtain a baseline ranking, using title and description for each topic, and using ...


Rutgers Interactive Track at TREC-5

The Interactive Track investigation at Rutgers concentrated primarily on three factors: the searchers’ uses and understandings of relevance feedback and ranked output, and the utility of relevance feedback for the interactive track task; the searchers’ understandings of the interactive track task; and performance differences based on topic characteristics and searcher and order effects. Our off...


The Robert Gordon University's HARD Track Experiments at TREC 2004

The High Accuracy Retrieval from Documents (HARD) track explores methods of improving the accuracy of document retrieval systems. As part of this track, the participants have investigated how information about a searcher’s context can be used to improve retrieval performance [Allan, 2003; Allan, 2004]. Searchers, referred to as assessors in this track, produce TREC-style search topics. Addition...


Amberfish at the TREC 2004 Terabyte Track

The TREC 2004 Terabyte Track evaluated information retrieval in large-scale text collections, using a set of 25 million documents (426 GB). This paper gives an overview of our experiences with this collection and describes Amberfish, the text retrieval software used for the experiments.




Publication year: 2004